feat(script): new script to fill positive_expected_result and fixing minibug on similarityID generation in scan commands #8011
Open
cx-ricardo-jesus wants to merge 22 commits into master
Reason for Proposed Changes
Proposed Changes
- Adds a script to fill `positive_expected_result.json` that does the work described above.
- The first thing `generate_positive_expected_result.py` does is call `parse_args()`, which sets up a CLI with two mutually exclusive modes:
  - `--run-all`: Scan every query found under `assets/queries/`.
  - `--queryID` + `--queryPath`: Scan a single specific query.
- If `--run-all` is passed, `iter_queries()` is called, which walks the entire `assets/queries` directory tree. For every subdirectory that contains a `metadata.json` (that is, a directory that contains a query), it reads the "id" field and yields `(query_id, query_path)`. The goal is to enumerate every query present in this repo. For each `(query_id, query_path)` pair, the function `run_query_scans(query_id, query_path)` is called.
- If `queryID` and `queryPath` are passed on the command line, the function `run_query_scans(args.queryID, args.queryPath)` is called directly for that one query.
- `run_query_scans(query_id, query_path)` discovers all positive test files for the given query, runs the appropriate KICS scans with the flags `--experimental-queries`, `--bom`, `--enable_openapi-refs`, `--kcs_compute_new_simid`, and then writes the `positive_expected_result.json` output(s).

Step 1: Discover the test files. `run_query_scans` calls `find_positive_tests(query_path)`, which looks inside `<query_path>/test/`. For every entry that starts with `positive`, it handles two layouts:
- Loose files (`test/positiveX.<ext>`): if the entry is a regular file (e.g., `positive1.tf`, `positive2.yaml`), it creates a `PositiveTest` object, imported from `models.py`, with:
  - a label of the form `positive<N>_<ext>` (e.g., "positive1_tf")
  - the group `test`, meaning results go to `test/positive_expected_result.json`.
- Subdirectories (`test/positiveX/`): if the entry is a subdirectory (e.g., `positive2/`), it iterates over the files inside. For each file (e.g., `positive2_1.tf`) it creates a `PositiveTest` with:
  - a label built the same way (e.g., `positive2_1_tf`)
  - the group `test/positive2`, meaning results go to `test/positive2/positive_expected_result.json`.
- The discovered tests are ordered so that `positive2` comes before `positive10`.

Step 2: Set up temporary directories. A single `TemporaryDirectory` is created for the entire query run, with the `payloads/` and `results/` subdirectories where KICS writes its payload files and JSON scan result files, respectively.

Step 3: Choose scan strategy based on test layout. After discovering the tests, `run_query_scans` checks whether the query test directory has any subdirectory-based tests.

If no subdirectory tests are found (all test files are in `test/`), the function runs two levels of scans:
- `run_directory_scan(query_id, all_paths, ...)` with every positive file at once.
- (… `passwords_and_secrets`): the function `run_scan` is called for each positive file separately.

If the query has subdirectory tests: this handles queries that have both loose files (e.g., `positive1.tf`) and subdirectory files (e.g., `positive2/positive2_1.tf`).

To run scans for all the test directory files at once, or for the files inside a subdirectory of tests, the `run_directory_scan` function is used. As mentioned before, `run_directory_scan` runs a single KICS scan command that targets all the files inside a test directory at once. The scan runs inside a temporary directory that mirrors the layout under `assets/queries`, so that the similarityIDs match. The files passed to this function are assumed to share the same parent directory (always `test/` or `test/<dir>/`). The function takes the parent of the first file as `src_dir`, then computes its path relative to `assets/queries/` to know where to mirror it inside the temp directory. It then iterates over every positive file in the list and copies each one into the mirrored temp directory, preserving only the filename (not the full path). At that point all the positive files sit together inside `target_dir`, exactly as they do inside `assets/queries/.../test`. A second loop then copies every file that does not start with `positive` or `negative`; these are auxiliary files, such as certificates, that the tests depend on. After this loop, the KICS CLI command is built with the temporary directory `tmp_dir` as the scan root and printed to stdout for traceability. The command then runs as a subprocess. If the KICS scan exits with a code that is not in `KICS_RESULTS_CODES`, an error is printed.

To run a scan for a single positive test file, the `run_scan` function is used. First, the path of the file relative to `assets/queries` is computed and stored in the `rel_to_queries` variable. For example, if `scan_path` is `.../assets/queries/terraform/aws/s3/test/positive1.tf`, then `rel_to_queries` becomes `terraform/aws/s3/test/positive1.tf`. As above, this relative path is replicated inside the temp directory so that the KICS engine computes the same similarityID as the unit tests do. The auxiliary files are then copied using `_copy_auxiliary_files`, and the KICS scan command is run using the helper function `_run_kics`.

Step 4: After all scans complete, the function `collect_and_write_expected_results(query_path, results_dir, label_to_group)` aggregates results and writes the final output files.
- It reads each result file in `results_dir`. For each file, it looks up the label (filename without extension) in `label_to_group` to determine which group it belongs to (`test` or `test/<dir>`). It reads the data present inside `queries` and `bill_of_materials` (combined into the `all_findings` variable), and for each finding extracts every file entry, converting it into an `ExpectedResultEntry` (defined in `models.py`).
- Findings from the `Passwords and Secrets` query can carry the wrong `queryName`, so the `fix_secrets_query_names` function was created to fix this. It reads `regex_rules.json`, identifies which rule IDs appear more than once, compiles the regex pattern of each affected rule, and then re-matches each affected finding against those patterns using the actual line content from the source file. Once the correct rule is identified, the entry's `queryName` is updated accordingly. This correction step ensures that the `positive_expected_result.json` for `passwords_and_secrets` reflects the true rule that triggered each finding, preventing errors in the unit tests.
- After fixing the `passwords_and_secrets` query names, entries within each group are deduplicated on all fields from the `FIELD_ORDER` variable, using a set of tuples to remove exact duplicates that can arise when the same finding appears in both the directory scan and individual scan results.
- If the `test` group has no entries (e.g., `test/` has only subdirectory positives), an empty `test` group is added. This ensures the unit tests always find a `test/positive_expected_result.json` to read.
- Entries are sorted using the ordering defined on `ExpectedResultEntry`, which mirrors the order of the `vulnerabilityCompare` Go function in `test/queries_test.go`. This ensures the written file's order matches exactly what the unit test produces when it sorts its actual findings, so comparisons are deterministic.
- Each group is written to `<query-path>/<group>/positive_expected_result.json`.
- Changed `getFilesMetadatasWithContent` inside `test/main_test.go` to set the respective `SubDocumentIndex` value for each file, which is used for multi-document `.yaml` samples in some queries. This mirrors the logic used for the results produced by the `(*Service).sink()` function in the `pkg/kics/sink.go` file. It fixes the cases where samples, typically in `.yaml` format, contain multiple documents and therefore produced different similarityIDs for CLI KICS scan commands than for the results produced by the unit tests.

I submit this contribution under the Apache-2.0 license.
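As a rough illustration of the discovery step (Step 1) described above, the following sketch shows one way labels, groups, and natural ordering could be derived. It is not the code from this PR: the `PositiveTest` dataclass here is a hypothetical stand-in for the model in `models.py`, `natural_key` and `make_label` are illustrative helper names, and skipping `positive_expected_result.json` is an assumption made because that file name also starts with `positive`.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumption: the generated output file must not be treated as a test sample.
EXPECTED_RESULT_NAME = "positive_expected_result.json"


@dataclass(frozen=True)
class PositiveTest:
    """Hypothetical stand-in for the PositiveTest model in models.py."""
    path: Path
    label: str  # e.g. "positive1_tf"
    group: str  # e.g. "test" or "test/positive2"


def natural_key(name: str) -> list:
    """Split digit runs from text so 'positive2' sorts before 'positive10'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def make_label(file_path: Path) -> str:
    """Turn 'positive1.tf' into 'positive1_tf'."""
    return f"{file_path.stem}_{file_path.suffix.lstrip('.')}"


def find_positive_tests(query_path: Path) -> list[PositiveTest]:
    """Collect loose and subdirectory positives under <query_path>/test/."""
    test_dir = query_path / "test"
    tests: list[PositiveTest] = []
    for entry in sorted(test_dir.iterdir(), key=lambda p: natural_key(p.name)):
        if not entry.name.startswith("positive"):
            continue
        if entry.name == EXPECTED_RESULT_NAME:
            continue
        if entry.is_file():
            # Loose file: results belong to test/positive_expected_result.json
            tests.append(PositiveTest(entry, make_label(entry), "test"))
        elif entry.is_dir():
            # Subdirectory: results belong to test/<dir>/positive_expected_result.json
            children = sorted(entry.iterdir(), key=lambda p: natural_key(p.name))
            for child in children:
                if child.is_file() and child.name != EXPECTED_RESULT_NAME:
                    group = f"test/{entry.name}"
                    tests.append(PositiveTest(child, make_label(child), group))
    return tests
```

For a `test/` directory holding `positive1.tf`, `positive10.yaml`, and `positive2/positive2_1.tf`, this sketch yields the labels `positive1_tf`, `positive2_1_tf`, and `positive10_yaml`, in that order, with the middle one in the group `test/positive2`.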